SUMMARY


The  practical  problem  of  appraising  the  accuracy  of  esti- 
mates--before  or  after  they  have  been  obtained--is  analysed. 

A procedure  called  decomposed  error  analysis  is  proposed, 
which  takes  quantified  assessments  of  different  kinds  of 
error,  such  as  random  sampling  fluctuations  and  mismeasure- 
ment,  and  synthesizes  them  into  a global  assessment  of  error. 
It  replaces  and  enlarges  classical  statistical  inference 
approaches  in  a personalist  format  which  does  not  depend  on 
Bayesian  updating.  Applications  from  the  private  and  public 
sector  are  presented. 
ON THE CREDIBILITY OF ESTIMATES:
ITS  EVALUATION  AND  IMPROVEMENT 


1.0  INTRODUCTION 


1.1  The  Problem 


Every  decision  maker,  whether  in  business,  government,  or 
some  other  type  of  organization,  relies  on  estimates  of  vari- 
ous kinds  as  a basis  for  resolving  practical  problems.  Public 
policy  and  opinion  are  commonly  based  on  mysteriously  precise 
estimates  of  quantities  whose  magnitude  cannot  conceivably  be 
known  with  any  but  the  vaguest  precision.  As  the  Time  Essay 
of  August  2,  1971,  "Of  Imaginary  Numbers,"  comments: 

From  solemn  public  officials  and  eager  corporations, 
from  newspapers,  television  (and  even,  some  dare  say, 
from  newsmagazines)  comes  a googol  of  seemingly  defin- 
itive and  unarguable  statistics.  They  tell  us,  with 
an  exactitude  that  appears  magical,  the  number  of  heroin 
addicts  in  New  York  and  the  population  of  the  world. 

By  simulating  reality,  they  assure  us  that  facts  are 
facts,  and  that  life  can  be  understood,  put  in  order, 
perhaps  even  mastered. 

If this sounds fanciful, consider a few specimens
from one issue of the New York Times last week:

BANGKOK: In 1965, only 17% of the people in northeastern
Thailand were within a day's journey of a
main road. Today the figure is 87%.

NEW YORK: The St. Patrick's Day parade cost the city
$85,599.61, where Puerto Rico Day cost only $74,169.44.

ATLANTA: There are 1.4 million illiterates in the U.S.

KABUL: Caravans traveling between Afghanistan and
Pakistan "commonly carry up to 1,200 pounds of opium
at a time."


If every statistic were regarded with ... skepticism,
it might well be found that many of our most widely
accepted figures are ..., at least in part, imaginary
numbers.  The  national  rate  of  unemployment,  for 
example,  is  now  stated  to  be  5.6%,  but  that  figure 
is  based  entirely  on  people  who  officially  reported 
themselves  out  of  work.  Idle  students,  housewives 
who  cannot  find  outside  jobs,  unsuccessful  artisans-- 
such  people  are  not  counted.  Statistics  on  crime 
are  equally  uncertain,  since  they  mainly  reflect 
police  diligence  in  rounding  up  minor  offenders  and 
reporting  all  arrests. 


Such  estimates  as  noted  above  may  be  derived  from  formal 
research,  notably  by  sampling  or  counting,  from  direct  observa- 
tion, or  from  hunch  or  "feel."  Most  commonly  they  involve  a 
mixture  of  these  sources  of  fact  and  opinion.  However  they 
may  be  derived,  all  estimates  are  subject  to  varying  degrees 
of  error.  Thus,  the  decision  maker — and  the  staff  specialists 
who  assist  him — must  somehow  take  account  of  the  nature  and 
extent  of  the  errors  associated  with  any  estimate. 


As  far  as  we  are  aware,  no  serious,  or  at  least  wide- 
spread, effort  has  been  made  by  presumably  responsible  purveyors 
of  public  or  private  estimates  to  so  much  as  indicate  "credible 
limits"  on  their  estimates,  let  alone  to  seek  reasonable  grounds 
for  such  limits. 


The  reader  may  be  quick  to  point  out  that  ever  since 
sample  surveys  came  into  widespread  use,  beginning  in  the 
1920's, formal methods, notably concerned with confidence
intervals,  have  been  employed  to  appraise  errors.  However, 
they  have  almost  invariably  addressed  only  one  class  of  errors, 
those  arising  from  sampling  fluctuations.  Because  sampling 
errors  can  be  analyzed  readily  and  explicitly,  it  sometimes 
appears  that  researchers  treat  sampling  as  the  only  source 
of  errors  in  estimates.  Experienced  researchers  and  users 
of research results alike know that in most cases, "non-sampling"
errors are much larger than pure sampling fluctuations.
The unwary may be led to believe that estimates are
far more precise than they actually are.

What  has  been  lacking  is  a systematic  method  for  analyzing 
total  error  in  estimates,  including  errors  arising  from  measure- 
ment and  other  sources  as  well  as  sampling  error.  This  paper 
describes  and  explains  one  approach  to  the  problem  of  evaluating 
total  error.  The  result  is  not  a complete,  tested  set  of  pro- 
cedures; but  it  may  be  a useful  step  toward  a very  important 
goal.  The  results  may  interest  managers,  analysts,  and  research 
specialists  in  a variety  of  fields. 

The  orientation  of  this  report  is  strictly  practical,  in 
the  sense  that  the  ultimate  beneficiary  is  intended  to  be  the 
man  of  affairs,  a decision  maker  who  may  use  the  fruits  of  the 
statistician's  labors  rather  than  the  statistician  himself 
(i.e., the orientation is to contribute to the technology of
administration and other applied arts).

The  problem  of  analyzing  total  error  is  familiar  to 
anyone  who  has  to  make  decisions  in  the  face  of  uncertainty, 
and  consists  of  two  parts: 

1.  how to assess uncertainty about relevant target
variables (such as a market share), which we will
call the problem of target assessment;

2.  how to evaluate ways of reducing this uncertainty,
which we will call the problem of research design.
1.2  Illustrations of Target Assessment and Research Design
Problems


When  a policy  maker  or  executive  looks  at  a completed 
piece  of  research  relating  to  some  target  variable,  such  as 
the  military  strength  of  a potential  adversary  for  a defense 
official,  or  the  demand  impact  of  an  advertising  campaign  for 
a businessman, he will normally have two questions in mind:
What should his own "best" estimate be? How much faith should
he have in this estimate?

If  the  executive  is  not  interested  implicitly  in  questions 
along  these  lines,  then  it  is  not  at  all  clear  how  the  research 
can  have  a bearing  on  his  decision  making,  or  why  the  research 
was undertaken in the first place. (Organizational prestige,
the relief of personal anxiety, or the desire to sell a decision
already made, are not unknown motivations for research, of course!)
How he does or should resolve such questions is open to debate
(but not to arbitrary choice). He may adopt as his own best estimate
whatever  raw  number  (estimate)  is  thrown  up  by  the  research  (in  a 
business  setting,  if  5%  of  widget  users  surveyed  claim  to  use 
brand X, 5% would be his estimate of the national brand share).
Alternatively,  he  may  want  to  adjust  that  raw  estimate  in  the 
light  of  any  prior  views  he  may  have  of  the  research  technique 
used  or  the  target  variable  itself. 

As  far  as  the  executive's  faith  in  his  best  estimate  is 
concerned,  he  may  treat  the  estimate  as  a certainty  in  his 
subsequent thinking; or, he may use some "objective" statistical
procedure to set a "confidence interval"; or, he may
somehow take account of his personal judgment in assigning
a margin of error or, as it is technically called, a credible
interval.
Intuitively, hard-headed administrators make their own
assessments and adjustments all the time, without recourse
to a theoretician. A business executive, for example, would
be quite likely to make an informal research appraisal of
the following kind: "This report says our company has 5%
of  the  widget  market.  Ridiculous!  We  are  selling  5,000  a 
week  and  the  total  market  cannot  be  more  than  50,000.  Prob- 
ably some  of  our  customers  in  the  survey  said  they  bought 
the  competitor's  brand  because  he  advertises  more.  I would 
up  that  estimate  to  10%  give  or  take  a few  percent." 

Defense  officials  evaluating  intelligence  reports  will 
often,  and  with  good  reason,  make  similar  responses.  Estimates 
of  interest  might  include  the  throw  weight  of  a Soviet  missile, 
the  number  of  Soviet  troops  stationed  in  Poland,  the  proportion 
of  Soviet  aircraft  equipped  with  certain  advanced  fire  control 
systems,  or  the  number  of  new  Soviet  tanks  in  East  Germany. 

1.3  The Need for Formal Aids

Now  it  is  quite  possible  for  an  executive  to  do  a per- 
fectly good  job  of  combining  survey  evidence  with  his  experi- 
ence in  making  such  an  appraisal  by  using  no  more  than  his 
informal  common  sense.  On  the  other  hand,  he  may  welcome 
some  formal  assistance  in  weighing  the  evidence. 

A realistic  appraisal  of  the  accuracy  of  an  estimate 
will  clearly  help  a decision  maker  to  use  that  estimate 
effectively.  It  may  also  provide  a useful  stimulus  to 
improving  the  estimating  process  itself.  It  is  a familiar 
phenomenon  that  appropriate  measures  of  effectiveness  for  any 
task  (like  estimation)  tend  to  improve  the  performance  of  that 
task. Who can doubt that Nielsen ratings have had the effect
(however deplorable) of moving TV programs in a direction
which maximizes the number of sets turned on (which, of course,
is what Nielsen measures)? The fact that the accuracy of
election  polls  can  be  checked  quickly  and  surely  no  doubt 
accounts,  in  large  measure,  for  the  high  degree  of  accuracy 
of  such  polls.  If  we  can  measure,  however  tentatively,  the 
accuracy  of  estimates  used  in  public  or  private  sectors, 
perhaps  this  will  put  effective  pressure  on  the  researcher 
(estimator)  to  make  his  estimates  more  accurate. 

In  defense  and  related  areas,  as  in  business,  quantita- 
tive research  is  almost  invariably  action  oriented,  and  the 
case  for  a meaningful  and  comprehensive  evaluation  tool 
addressing  user  interest  becomes  irresistible.  The  case  can 
perhaps  be  made  (suspect,  in  my  opinion)  that  scientific 
research  should  only  be  reported  in  classical  terms,  i.e. 
restricting  attention  to  objectively  measurable  sources  of 
error,  like  random  sampling.  Surely  no  case  can  be  made  for 
so  restricting  the  appraisal  of  estimates  on  which  national 
policy  may  be  based. 

In  current  practice  in  military  intelligence,  classical 
validation  tests  are  in  fact  used  in  only  a small  proportion 
of  cases  involving  quantitative  estimates.  Such  tests  are 
limited  by  the  number  of  people  qualified  to  apply  them  and 
are  normally  performed  only  by  scientific  specialists.  In- 
telligence analysts  typically  are  not  trained  in  these  methods. 
They  may  make  statements  of  the  form  "such  and  such  Warsaw 
Pact Division has 8,500 men in it plus or minus 10%." Such
an  assessment  would  take  into  account  all  the  considerations 
the  analyst  thought  relevant,  but  it  would  be  presented  without 
formal  validation  for  the  latter  interval  or  an  indication  of 
how  probable  it  is  that  the  true  number  lies  within  that  range. 
Because of the requirement (actual or perceived) that
quantitative  estimates  be  publicly  documented,  it  is  not 
uncommon  for  private  estimates  made  by  analysts  to  differ  from 
those  made  public.  The  latter  may  only  consist  of  elements 
that  can  be  firmly  defended,  and  the  former  may  include  richer 
but  more  diffuse  and  less  readily  verified  and  validated 
data  in  which  they  nonetheless  have  more  confidence.  It  is 
not  apparent  whether  any  consistent  bias  exists  between  the 
two.  However,  the  private,  more  realistic  estimate  will 
typically  be  hedged  by  a larger  margin  of  uncertainty  than 
the  public  estimate. 

Outside  of  the  military,  other  government  agencies  also 
engage  in  making  estimates  and  designing  surveys.  For  example, 
the  Federal  Energy  Administration  is  currently  concerned  with 
how  to  specify  data-gathering  projects  in  order  to  produce  the 
most  credible  estimates,  allowing  for  biases  and  other  sources 
of error in estimates received. Such surveys will seek
estimates of:

o financial and other operations of oil companies;

o the availability of natural gas supplies at peak
  demand times; and

o demand patterns of motorists and other energy
  consumers.

Finally,  the  need  for  user-oriented  appraisal  of  esti- 
mating strategies  and  estimates  is  nowhere  more  evident  than 
in  the  social  and  natural  sciences,  whose  empirical  core  is 
based  on  experiments  and  other  sample  inquiries.  The  con- 
ventional scientific  validation  procedures  of  classical 
statistics,  such  as  specification  of  confidence  intervals 
and tests of significance, are partial and typically confusing
measures from the user's point of view, useful as they may be
for standardized documentation of reported experiments (see
Section 2.1 below).

It  would  be  useful,  therefore,  for  government  agencies 
which  carry  on  a significant  amount  of  social  science  research 
to  have  a more  complete  and  less  confusing  method  of  appraising 
estimating  strategies  and  estimates. 

The  techniques  discussed  in  this  paper  have  been  used 
in  particular  by  the  Federal  Energy  Administration  to  estimate 
conservation  behavior  of  households  and  the  market  for  solar 
heating devices in the home as presented in Brown et al. (1977)
and Campbell et al. (1977). A fuller development appears in
Brown (1969).


2.0  CURRENT  STATE  OF  THE  ART 


2.1  Classical Inference Techniques

Of  course,  the  technical  literature  abounds  with  pro- 
cedures that  appear  to  address  problems  such  as  those  mentioned 
above.  When  sample  findings  are  available,  for  example,  common 
devices  such  as  confidence  intervals,  maximum  likelihood  esti- 
mates, and  tests  of  significance  certainly  seem  to  be  saying 
something  about  what  we  call  target  assessment.  However,  the 
trouble  with  classical  inference  tools  such  as  these  is  that 
their  output  is  not  in  a form  that  is  of  direct  interest  to  a 
decision maker. He wants to answer the very personal question,
"Where does my target variable probably lie?", whereas a
confidence interval, for example, is telling him how surprising
the observed sample would be if some variable (not necessarily
his target variable) had some hypothetical values. This simply
is not answering any question a typical executive would want to
ask.


For  example,  the  confidence  interval  says  something  very 
difficult  for  the  layman  to  interpret  (or  use,  if  he  can 
interpret  it),  as  follows.  "Intervals  calculated  as  this  one 
was  from  repeated  samples  will  include  the  true  value  95%  of 
the  time."  A special,  but  common,  case  of  a 95%  confidence 
interval  is  computed  as  follows:  The  lower  limit  is  selected 
such  that,  if  it  were  the  true  value  of  the  target  variable, 
repeated  sampling  would  produce  a research  statistic  larger 
than  that  actually  obtained  2-1/2%  of  the  time--and  conversely 
for  the  upper  limit.  Figure  2-1  gives  a graphic  illustration, 
based  on  a simple  random  sample  of  nine  hundred,  of  which 
ninety  showed  the  property  in  question.  (The  population  from 
which  it  is  drawn  is  effectively  infinite.) 



Where the target variable is a fraction r, for example,
and a sample of size n produces a fraction p, the 95% confidence
limits are approximated by

$$ p \pm 2 \left( \frac{p(1-p)}{n} \right)^{1/2} $$

and this formula is in very common use.




Substantial  literature  has  been  developed  on  the  consider- 
able variety  of  confidence  interval  techniques  in  use.  However, 
all  of  them  partake  of  the  same  general  character  already 
discussed  and  which  will  be  developed  next.  The  differences 
are  not  critical  to  this  inquiry. 

2.2  Classical Inference Applied to Camford Case

In  the  real  study  on  which  the  Camford  example  is  based, 
classical  inference  was  attempted  in  a way  which  is  very  typical 
in  survey  estimate  appraisal.  In  the  sample  of  nine  hundred 
locally  registered  car  owners,  it  will  be  recalled  that  ninety, 
or  10%  of  the  sample,  reported  that  they  would  park  at  peak 
hours  on  the  given  days,  if  meters  were  introduced.  Approximate 
95% confidence limits computed according to the formula just
presented are

$$ 0.1 \pm 2 \left( \frac{0.1 \times 0.9}{900} \right)^{1/2}, \quad \text{or 8\% to 12\%,} $$

which is what appeared in an Appendix to the original Camford
report. The exact limits, computed by a computer program, are
8.12% to 12.15%. Figure 2-1 shows how these inferences are
built  up:  10%  is  a "maximum  likelihood"  estimate,  in  the 
sense  that  10%  is  more  likely  to  be  the  sample  value,  if  10% 
were  the  true  fraction  in  the  sampled  population,  than  if  the 
population  had  any  other  fraction. 
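
The approximate limits quoted for the Camford sample can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name `approx_confidence_limits` and the parameter `z` are assumptions introduced here, and the multiplier of 2 follows the approximation stated in the text rather than the exact normal quantile.

```python
import math


def approx_confidence_limits(successes, n, z=2.0):
    """Approximate limits p +/- z * sqrt(p * (1 - p) / n) for a sample fraction.

    `z` defaults to 2, the multiplier used in the text's 95% approximation.
    """
    p = successes / n
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return p - half_width, p + half_width


# Camford sample: ninety of nine hundred respondents.
lower, upper = approx_confidence_limits(90, 900)
print(f"{lower:.2%} to {upper:.2%}")
```

The sketch covers only the sampling-fluctuation calculation; it says nothing about nonresponse or reporting error, which is the limitation the surrounding discussion is concerned with.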
The three curves are sampling distributions showing the
probability  of  obtaining  any  particular  sample  fraction  given 
the  true  fraction  of  population  being  sampled  (not  necessarily 
the  population  of  interest).  If  the  lower  limit  A were  the 
true  fraction,  repeated  samplings  would  produce  a fraction 
larger  than  that  actually  obtained  2-1/2%  of  the  time.  The 
reverse  is  true  for  C,  the  upper  limit. 

Now  at  first  sight,  it  might  appear  that  the  target 
assessment  questions  posed  earlier  have  been  answered.  Indeed, 
a large  fraction  of  the  countless  users  of  confidence  limits 
would  have  the  impression  that: 

(1)  10%  is  the  "best"  single  estimate  for  the  true 
proportion  of  "metered  parkers"  in  the  frame  of 
local  motorists  sampled; 

(2)  it  is  reasonable  to  assign  about  95%  probability 
to  the  true  proportion  lying  between  8%  and  12%. 

In  general,  neither  interpretation  can  even  approximately 
be  supported  (see  Brown  1969,  pages  73-82). 

When  an  executive  considers  the  research  design  (as 
opposed  to  target  assessment)  problem,  he  wants  to  answer  a 
question  like  "What  research  can  I do  which  will  make  me  least 
uncertain?"  It  would  never  occur  to  him,  and  rightly  so,  to 
ask,  "What  research  will  produce  the  smallest  sampling 
variance  from  among  those  research  designs  for  which  a sampling 
variance  can  be  objectively  calculated?"  The  latter  is  the 
kind  of  information  he  might  extract  from  the  currently  dominant 
tools of classical inference.[1]

[1] A more general and technical discussion of the weakness of
classical inference for decision-making purposes appears
in Pratt et al. (1965), Chapter 20.
Instead, when assessing his target variable, the decision
maker  surely  wants  to  come  up  with  a personal  probability 
assessment,  possibly  in  the  detailed  form  shown  in  Figure  2-2. 
In  most  cases,  he  will  be  satisfied  with  a simple  summary  of 
the  distribution,  say,  as  an  interval  within  which  he  is  95% 
sure  the  target  variable  really  lies--in  this  case  300  to 
2200--or  possibly  just  his  expectation--in  this  case,  1100. 

Similarly,  when  choosing  among  research  designs,  he  will 
want  to  look  ahead  to  the  kind  of  probabilistic  assessment  he 
can  expect  to  make  after  the  research.  Presumably,  he  will  opt 
for  the  design  which,  in  some  sense,  promises  to  produce  a 
personal  probability  distribution  with  as  little  "spread"  as 
possible. 

2.3  Personalist Decision Analysis

A new branch of statistics known as personalist decision
analysis (PDA) is available to handle personal decision and
inference problems of this kind.[2] Specific variants known as
Bayesian probability updating and preposterior analysis have
been substantially developed to address exactly these problems.
However, even though military analysts and other staff people
have been exploring the applications of these specific tools,[4]
the tools have been slow to take hold among real-world decision
makers. A survey of business applications of PDA [5] found very
few instances where executives acted on the implications of
such analyses (though plentiful use of other variants of
PDA, notably decision trees, was reported).

[2] See Savage (1972), Raiffa & Schlaifer (1961).
[3] See Brown et al. (1974), Chapters 35, 26.
[4] See Barclay et al. (1977), Chapters 4 and 5.
[5] See Brown et al. (1974), Chapter 7.
[Figure 2-2. Personal probability for target variable]
No doubt, part of this lack of implementation is due
to the quite natural lag between a new technology's development
at a theoretical level and its becoming operational.
But part may be because the technique itself, as currently
developed, is not always appropriate for use by nontechnical
decision makers or executives.

In  particular,  the  technique  involves  Bayes'  Theorem  and 
requires  the  executive  to  participate  in  the  assessment  of 
esoteric  inputs  (e.g.  "prior  distributions"  and  "likelihood 
functions")  which  his  training  typically  does  not  equip  him 
to  supply  or  even  to  understand. 

Moreover,  very  few  executives  feel  that  they  understand 
even  the  general  purpose  of  these  devices.  For  this  reason, 
they  are  understandably  hesitant  to  trust  decisions  that  may 
involve  millions  of  dollars  of  private  and  public  resources 
to  an  analysis  based  on  an  arcane  logic. 

Is there any way of avoiding these drawbacks? We feel
there is, and propose an alternative which, while it is
personalistic in the sense that it accepts personal inputs and its
output is interpreted personally (like the tools just mentioned),
does not depend on Bayes' Theorem (which they typically do)
and, we hope, avoids some of its drawbacks.
3.0  A SUGGESTED APPROACH

3.1  Decomposed Variable Analysis (DVA)

The decomposed variable analysis[6] technique depends not
on  Bayes'  Theorem,  but  on  the  equally  well-known  logic  of  the 
distribution  of  functions  of  random  variables.  In  the  special 
version  to  be  presented,  it  can  be  used  both  for  problems  of 
target  assessment  and  of  research  design. 

The  essential  steps  are  very  simple  and  are  as  follows: 

1.  The target variable is decomposed, in the sense that
it is expressed as a function of two or more components.
A very simple example would be to express
future  demand  for  energy  as  energy  per  consumer 
times  number  of  consumers.  A slightly  more  elaborate 
decomposition  (and  decompositions  can  get  very 
elaborate)  would  be  to  express  energy  as  the  sum  of 
multiplicative  expressions  of  the  above  form  for 
each  of  a number  of  use  sectors,  such  as  lighting, 
heating,  transportation,  etc. 

2.  Each component thus defined is assessed probabilistically
(e.g. in the form of a personal probability
distribution) on the basis of whatever evidence is
available to the assessor. This evidence could
include field work, judgment, or published statistics,
and the supporting reasoning could be any combination
of intuition and statistical theory (including, possibly,
Bayesian probability updating).

3.  A personal probability distribution (e.g., in the
form of Figure 2-2) is derived routinely by standard
statistical procedure from the component distributions
and the decomposition formula by which they
are combined. Computer programs, mathematical
formulas, and other supporting devices have been
developed to make this processing as painless as
possible.

[6] Also referred to as "credence decomposition," e.g. in
Brown (1969).


For example, when a target variable, t, is decomposed into
a product of components (for example, t = x · y · z), the required
distribution for t can be approximated from assessed distributions
for the components x, y, and z as follows.

Let the mean of t be represented as E(t), and let the
95% credible span of t be represented as C(t). (The assessor
assigns a 95% probability that the target variable lies
within the range specified as the 95% credible span, so that
C(t) is the width of that range.) Assess
for each component the mean (E(x), and so on) and credible
span (C(x), and so on). Then, if the components are judgmentally
independent of one another, or nearly so, the following
approximations hold.[7]

$$ E(t) = E(x) \times E(y) \times E(z) $$

$$ \frac{C(t)}{E(t)} = \left[ \left(\frac{C(x)}{E(x)}\right)^2 + \left(\frac{C(y)}{E(y)}\right)^2 + \left(\frac{C(z)}{E(z)}\right)^2 \right]^{1/2} $$

[7] As explained in Brown (1969), Chapter 9.

For example, suppose the "min," mean, and "max" (note
that "min" and "max" refer to the edges of a 95% credible
interval, not absolute limits) of the three components are
assessed to be:

x:  5, 10, 15
y:  10, 60, 120
z:  .1, .5, .8


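Applying the above formulas will give

$$ E(t) = 10 \times 60 \times .5 = 300 $$

$$ C(t) = 300 \left[ \left(\frac{10}{10}\right)^2 + \left(\frac{110}{60}\right)^2 + \left(\frac{.7}{.5}\right)^2 \right]^{1/2} $$

$$ = 300\,(1 + 3.36 + 1.96)^{1/2} $$

$$ = 300\,(5.32)^{1/2} $$

$$ = 692 $$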

If the distribution of t were symmetrical, the "min,"
mean, and "max" would, of course, be given by 300 ± 346. We
can usually get a better approximation to the edges of the
credible interval by assuming "log-symmetry," which implies
that the "upper edge" divided by the mean equals the mean
divided by the "lower edge." This is still only an approximation,
however, and is arithmetically bothersome. (The exact
determination would require more detailed input and statistical
theory.) One can do just about as well by locating the credible
interval by eye, viz.:

t:  120, 300, 812

(Note that 812 - 120 = 692.)
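
The product approximation and the log-symmetry adjustment can be sketched in Python as follows. This is a hedged illustration, not the programs mentioned in the text: the function names `product_decomposition` and `log_symmetric_edges` are assumptions introduced here, and the credible span of each component is taken to be its "max" minus its "min." Evaluated on the three components above, the squared relative spans sum to about 6.32, so the sketch returns a span near 754 rather than the printed 692 (which corresponds to a sum of 5.32); readers may wish to check the printed arithmetic against Brown (1969).

```python
import math


def product_decomposition(components):
    """Approximate mean and 95% credible span of a product of components.

    `components` is a list of (low, mean, high) triples, where low and high
    are the edges of each component's 95% credible interval.
    Assumes the components are (nearly) independent, as the text requires.
    """
    mean_t = math.prod(mean for _, mean, _ in components)
    # Sum of squared relative spans, (C(x)/E(x))^2 + (C(y)/E(y))^2 + ...
    rel_sq = sum(((high - low) / mean) ** 2 for low, mean, high in components)
    span_t = mean_t * math.sqrt(rel_sq)
    return mean_t, span_t, rel_sq


def log_symmetric_edges(mean_t, span_t):
    """Edges (lo, hi) with hi / mean == mean / lo and hi - lo == span."""
    r = span_t / mean_t
    k = (r + math.sqrt(r * r + 4.0)) / 2.0  # positive root of k - 1/k = r
    return mean_t / k, mean_t * k


components = [(5, 10, 15), (10, 60, 120), (0.1, 0.5, 0.8)]
mean_t, span_t, rel_sq = product_decomposition(components)
lo_edge, hi_edge = log_symmetric_edges(mean_t, span_t)
print(f"E(t) = {mean_t:.0f}, sum of squares = {rel_sq:.2f}, C(t) = {span_t:.0f}")
print(f"log-symmetric interval: {lo_edge:.0f} to {hi_edge:.0f}")
```

Solving for the log-symmetric edges in closed form removes the "arithmetically bothersome" step, though it remains an approximation in the same sense as the by-eye interval.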
The above routines are approximations, but adequate as
a first pass for many real problems. (A method for obtaining
greater precision and generality, for example by using simulation,
is discussed in Section 9.2 of Brown (1969). Specific
computer programs have been developed for this purpose at
Decisions and Designs, Incorporated.)

3.2  Decomposed Error Analysis

At this level of generality, decomposed variable assessment
is a rather trivial (if grossly under-exploited!) tool.
However, there is a variant of DVA, decomposed error analysis
(DEA), which is less obvious and which seems to lend itself
rather conveniently to problems of target assessment and
research design.

In  DEA,  what  is  decomposed  is  not  the  target  variable  of 
ultimate  interest  to  the  executive  but  rather  the  estimating 
error  resulting  from  a specific  piece  of  quantitative  research, 
such  as  a sample  survey. 

There  are  at  least  two  ways  of  formally  defining  estimating 
error.  It  can  be  defined  as  the  difference  between  the  target 
variable  and  some  more  or  less  arbitrary  estimate  calculated 
from  the  research  findings.  Alternatively,  it  can  be  the  ratio 
of  the  target  variable  to  such  an  estimate.  Either  formulation 
has advantages in particular circumstances, though for illustrative
purposes only the error ratio will be discussed.


3.2.1  An  urban  planning  example  - The  British  town  of 
Camford  had  a mail  survey  done  to  assess  the  probable  demand 
for parking space if meters were introduced. A list of ten
thousand locally registered motorists was obtained, and one
thousand were randomly selected and sent questionnaires.
Nine hundred returned the questionnaires and, of these, 10%,
or ninety, indicated that if meters were introduced, they
would be parked in the downtown area at a given peak hour.

The  city  engineer's  target  assessment  problem  is  what 
to  conclude  about  the  actual  demand  for  parking  space  if  meters 
were  introduced,  expressed  in  a form  like  Figure  2-2.  He  also 
has  a research  design  problem.  If  he  conducts  a new  survey  in 
another  town,  should  he  use  the  same  budget  on  another  mail 
survey  or  rely  on  a smaller  personal  survey? 

3.2.2  DEA for target assessment - On the target assessment
problem, the first thing the city engineer might do using
decomposed variable analysis would be to decompose the total
spaces needed (if meters were introduced) into the product of:

1.  the fraction of local motorists needing space (t);

2.  the number of local motorists (n); and

3.  some adjustment factor intended to allow for
spaces needed by out-of-town parkers (f).

If  probabilistic  assessments  can  be  made  for  each  of 
these,  a probability  distribution  on  the  target  variable  can  be 
derived  routinely.  The  number  of  local  motorists  (n)  is  known 
to  be  ten  thousand,  so  no  probabilistic  assessment  is  needed  of 
that  component.  The  out-of-town  adjustment  component  (f)  can 
be assessed informally by direct intuition. This leaves the
"local fraction" (t), the variable which the mail survey
addresses. The city engineer may have more misgivings about
informally assessing a probability distribution on this variable,
so he might decide to perform an error assessment of (t).

Figure  3-1  shows  the  essential  steps  the  assessor 
might  go  through  in  order  to  express  total  error  ratio  as  a 
function  of  component  ratios  which  reflect  distinguishable  (and 
assessable)  sources  of  error.  The  nested  rings  at  the  top  of 
the  figure  and  the  vertical  lines  indicate  the  various  ways  in 
which  sources  of  error  can  enter  between  the  true  value  of  the 
target, t (the "local fraction"), and the estimate, a' (known
to be 10%).

Thus,  t/a'  is  the  total  error  ratio,  and  the  compo- 
nent error  ratios  are  defined  in  the  line  in  Figure  3-1  marked 
"Decomposition."  It  can  be  seen  that  three  sources  of  error 
are  distinguishable:  random  error,  nonresponse  error,  and 
reporting  error.  It  can  easily  be  verified  that  each  error 
ratio  will  equal  one  if  there  is  no  error  of  that  type  involved. 
The  set  of  boxes  on  the  right  hand  side  of  the  bottom  line  of 
Figure  3-1  summarizes,  in  the  form  of  95%  credible  intervals, 
probabilistic  assessments  that  were  made  for  each  of  three 
component  error  ratios.  The  detail  of  these  assessments  and 
the logic behind them are described in Brown (1969), Chapter 8.
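
One way to see why each component ratio equals one when its error is absent is to write the total error ratio as a telescoping product. The intermediate symbols here are assumed notation for illustration and need not match those of Figure 3-1: t_s stands for the true fraction among the thousand motorists sampled, and t_r for the true fraction among the nine hundred who responded.

```latex
\frac{t}{a'}
  \;=\;
  \underbrace{\frac{t}{t_s}}_{\text{random error}}
  \times
  \underbrace{\frac{t_s}{t_r}}_{\text{nonresponse error}}
  \times
  \underbrace{\frac{t_r}{a'}}_{\text{reporting error}}
```

Under this reading, a sample that mirrored the population exactly would give t = t_s, respondents who mirrored the sample would give t_s = t_r, and truthful answers would give t_r = a', so that each factor in turn reduces to one.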


The expectation and credible interval for the total
error ratio t/a' are calculated from the component error
assessments via the product decomposition formula given in
Section 3.1 above as 1.08 and .6 to 2.1, respectively.
Multiplying t/a' by a' (= .1) yields an expectation for the
target fraction of 10.8% and a 95% credible interval of 6% to
21%, as shown in the left hand boxes of Figure 3-1.

Figure 3-1. ASSESSING A POPULATION FRACTION FROM A
SURVEY ESTIMATE USING DECOMPOSED ERROR ANALYSIS

Note: The top diagram defines the quantities in the Error Ratio
Decomposition below it. The numerical input and output appear in
the lower boxes below the corresponding elements in the
decomposition.

The city engineer might thus conclude, if he accepts
the  input  assessments,  that  he  can  be  95%  certain  that  the 
local  fraction  t lies  between  6%  and  21%,  with  an  expectation 
of 10.8%. Conjoined with the knowledge that there are ten
thousand local motorists and an assessment of the "out-of-town
adjustment" with a credible interval of 1 to 1.2, a distribution
on the real target variable, total spaces needed, was
derived and is displayed in Figure 2-2. Therefore, his target
assessment is that, with 95% personal probability,
between 300 and 2200 parking spaces will be needed if meters
are introduced.
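
The chain from component assessments to a target interval can be sketched in a few lines of Python. This is a minimal illustration, not the formula of Section 3.1: it assumes the factors are independent and lognormally distributed, so that a 95% credible interval fixes each factor's log-mean and log-standard deviation. The function names are hypothetical. The inputs are the ones quoted in the text (t/a' from .6 to 2.1, f from 1 to 1.2, a' = .1, n = 10,000). Because the distributional assumption is ours, the output should not be expected to reproduce the figures quoted above.

```python
import math

Z95 = 1.959964  # standard normal quantile for a central 95% interval


def lognormal_from_interval(low, high):
    """Return (mu, sigma) of a lognormal whose central 95% interval is (low, high)."""
    mu = (math.log(low) + math.log(high)) / 2.0
    sigma = (math.log(high) - math.log(low)) / (2.0 * Z95)
    return mu, sigma


def product_interval(components, constant=1.0):
    """Combine independent lognormal factors into one lognormal product.

    components: list of (low, high) 95% credible intervals, one per factor.
    constant:   known multiplier carrying no uncertainty.
    Returns (mean, low, high) for the product.
    """
    mu = math.log(constant)
    var = 0.0
    for low, high in components:
        m, s = lognormal_from_interval(low, high)
        mu += m        # log-means add
        var += s * s   # log-variances add under the independence assumption
    sigma = math.sqrt(var)
    mean = math.exp(mu + var / 2.0)
    return mean, math.exp(mu - Z95 * sigma), math.exp(mu + Z95 * sigma)


# Inputs quoted in the text; the constant is a' times n.
mean, low, high = product_interval([(0.6, 2.1), (1.0, 1.2)], constant=0.1 * 10_000)
print(f"spaces needed: mean {mean:.0f}, 95% interval {low:.0f} to {high:.0f}")
```

A convenient property of this sketch is that a single factor passed through `product_interval` returns its own interval unchanged, which gives a quick check that the inputs were entered correctly.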


3.2.3 DEA for research design - For the research design
problem,  the  city  engineer  would  go  through  virtually  the  same 
procedure  for  each  of  the  alternative  research  designs  considered. 
If  the  cost  is  the  same,  he  might  reasonably  choose  whichever 
strategy  leads  to  least  uncertainty,  as  measured,  say,  by  the 
span  of  the  credible  interval  on  the  total  error  ratio.  Alter- 
native research  design  criteria  can  be  selected,  such  as  prior 
expectation  of  posterior  variance,  but  they  seem  to  produce 
almost  identical  rankings. 
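
The comparison of designs by span of credible interval might be sketched as follows. All numbers are hypothetical and chosen only for illustration: the mail survey is given a wide nonresponse interval, the smaller personal survey a wider random-error interval. As a simplifying assumption of ours, the component error ratios are treated as independent and lognormal; the function and design names are likewise invented for the example.

```python
import math

Z95 = 1.959964  # standard normal quantile for a central 95% interval


def total_error_ratio_interval(component_intervals):
    """95% interval for a product of independent lognormal error ratios."""
    mu = sum((math.log(lo) + math.log(hi)) / 2.0
             for lo, hi in component_intervals)
    var = sum(((math.log(hi) - math.log(lo)) / (2.0 * Z95)) ** 2
              for lo, hi in component_intervals)
    half_width = Z95 * math.sqrt(var)
    return math.exp(mu - half_width), math.exp(mu + half_width)


def rank_designs(designs):
    """Return (span, name) pairs, narrowest total interval first."""
    ranked = []
    for name, intervals in designs.items():
        low, high = total_error_ratio_interval(intervals)
        ranked.append((high - low, name))
    return sorted(ranked)


# Hypothetical 95% intervals for (random, nonresponse, reporting) error ratios.
designs = {
    "mail survey": [(0.80, 1.25), (0.70, 1.60), (0.80, 1.40)],
    "personal survey": [(0.70, 1.45), (0.90, 1.15), (0.85, 1.25)],
}

for span, name in rank_designs(designs):
    print(f"{name}: span of total error ratio interval = {span:.2f}")
```

With equal budgets, the design listed first would be the one preferred under the span criterion; a different criterion could be substituted by changing the quantity stored in `ranked`.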


It  is  possible  that  the  most  important  applications 
of  DEA  will  be  not  in  appraising  research  estimates  after  the 
fact,  but  rather  in  choosing  research  strategies  from  which 
estimates  will  emerge. 


The following research design applications of DEA are
examples drawn from the author's experience. Although the
contexts are largely business oriented (other than the last),
analogies with research design problems in other areas can
readily be made.

1. To estimate brand sales, should a consumer
panel  be  used  to  estimate  sales  per  consumer, 
or  a store  audit  performed  to  estimate  sales 
per  store?  In  either  case,  total  error  depends 
on  errors  in  estimating  the  size  of  the  popula- 
tion (consumers  or  stores)  and  in  estimating 
sales  per  unit.  Decomposition  and  four  separate 
error  assessments  helped  decide  that,  in  a 
particular  instance,  a store  audit  promised  the 
least  serious  combination  of  errors. 

2.  To  estimate  annual  replacement  demand  for  shock 
absorbers,  how  should  the  decomposition  be 
formulated?  Should  it  be  vehicles  in  circula- 
tion times  annual  replacement  rate  (estimated 
from  information  from  vehicle  manufacturers 
and  motorist  interviews)?  Or,  should  it  be 
the  product  of  the  number  of  garages  times  the 
average  replacement  sales  per  garage  (requiring 
a garage  survey)?  Or,  should  both  approaches 
be  used  and,  if  so,  in  what  proportion?  (The 
latter  strategy  was  selected  with  the  main 
emphasis  on  the  second  approach.) 

3.  Which  of  several  sampling  lists  should  be  used? 
(The  less  complete  list  may  contain  classifying 
information  which  permits  a more  efficient  sample 
design,  but  the  omissions  may  have  important 
distinguishing  features.) 

4. Should random or quota sampling be used to
estimate family savings patterns? (Random
sampling promises a more representative
sample, but quota sampling perhaps promises
more believable respondents.)

5.  To  estimate  the  number  of  welding  sets  in  use, 
should  a simple  random  sample  of  industrial 
companies  be  preferred  over  a judgmentally 
skewed  sample  which  favors  large  companies? 

6. To establish how many automotive parts were
purchased by a vehicle fleet operator, would
you ask him or sample his maintenance records?
(This permits a trade-off between convenience
and accuracy.)

7.  If  several  different  estimates  of  the  same 
target  are  available  based  on  different  sources 
of  information,  how  should  the  data  be  pooled? 

8.  What  is  the  right  economic  balance  of  research 
resources  between  gathering  data  and  analyzing 
it? 

9. In estimating energy demand, should many
converging approaches be used and the results
pooled or should all available resources be
devoted to a single estimating approach? DEA
suggested the former.

4.0 CONCLUSIONS


4.1 How Can the Policy Maker Use DVA?

The  patient  reader  who  has  borne  with  us  thus  far  and 
has  been  persuaded  to  try  decomposed  variable  analysis  on 
research  design  or  on  target  assessment  problems  may  wonder 
how,  precisely,  to  proceed. 

It  is  unlikely  that  the  typical  executive  needs  to 
involve  himself  in  much  more  detail  than  is  covered  in  this 
report  provided  he  understands  very  clearly  the  input  (assess- 
ments) and  the  output  (conclusions)  of  the  analysis.  He 
may,  however,  wish  to  delegate  the  detail  and/or  confer  with 
someone  experienced  in  using  the  technique.  In  our  experience, 
the  greatest  dangers  inherent  in  any  type  of  formal  approach 
to  executive  problems,  including  operations  research  and 
other  approaches  in  common  use,  are  that  the  problem  solved 
is  different  from  the  problem  the  executive  has  and  that 
assumptions  underlying  the  analysis  are  unacceptable  to  the 
executive  (although  he  may  not  be  aware  of  the  assumptions). 
These  are  particularly  serious  complaints  against  conventional 
uses  of  statistics  for  research  appraisal. 

This  is  not  to  say  that  decomposed  variable  analysis 
invariably  needs  the  participation  of  a technical  specialist. 

If  the  decomposition  of  the  target  variable  goes  no  further 
than  a few  intervening  variables  without  explicit  error  de- 
composition for  any  one  of  them,  DVA  can  be  quick  and  trouble- 
free  even  for  the  layman. 

Suppose an executive requires a quick but reasonable
probabilistic assessment of some quantity of interest, and
his information and judgment are based on varied and diffuse
sources. He could decompose this target into a few components
on which his experience independently bears, making direct
intuitive assessments of each and using a simple formula (or
computer program) to process them.


More specifically, suppose the target variable is the
demand for gasoline at $1 a gallon in two years' time. It
can be decomposed as the product of:

1. how many motorists there are now;

2. the rate of growth of the motorist population over
the next year;

3. the individual average mileage of the motorists;
and

4. the average consumption of gas per mile.


The executive has then only to think about these components
in turn and judgmentally assign an expected value and 95%
credible interval to each. By applying a simple arithmetical
procedure, a "best forecast" and a credible interval for
the target are quickly obtained.
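
Where a computer program is preferred to a formula, the same forecast can be approximated by simulation. The sketch below is an illustration under assumptions of ours, not the arithmetical procedure referred to above: each component is drawn independently from a lognormal distribution fitted to its 95% credible interval, and the growth rate is entered as a multiplicative factor. Every number is hypothetical, as are the function names.

```python
import math
import random

Z95 = 1.959964  # standard normal quantile for a central 95% interval


def sample_product(intervals, n_draws=50_000, seed=0):
    """Draw from the product of independent lognormal components."""
    rng = random.Random(seed)
    # Fit (mu, sigma) of each lognormal to its 95% credible interval.
    params = [((math.log(lo) + math.log(hi)) / 2.0,
               (math.log(hi) - math.log(lo)) / (2.0 * Z95))
              for lo, hi in intervals]
    draws = []
    for _ in range(n_draws):
        value = 1.0
        for mu, sigma in params:
            value *= rng.lognormvariate(mu, sigma)
        draws.append(value)
    return draws


def summarize(draws):
    """Return (mean, 2.5th percentile, 97.5th percentile) of the draws."""
    ordered = sorted(draws)
    n = len(ordered)
    return (sum(ordered) / n,
            ordered[int(0.025 * n)],
            ordered[int(0.975 * n) - 1])


# Hypothetical 95% credible intervals, one per component of the product.
components = [
    (90e6, 110e6),     # motorists now
    (1.01, 1.05),      # growth factor for the motorist population
    (8_000, 12_000),   # average miles per motorist per year
    (0.05, 0.08),      # gallons consumed per mile
]

mean, low, high = summarize(sample_product(components))
print(f"gallons per year: best forecast {mean:.3g}, "
      f"95% interval {low:.3g} to {high:.3g}")
```

Simulation is slower than a closed formula but makes it easy to swap in other component distributions, or dependence between components, should the executive's judgment call for them.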


4.2 The Appraisal Tool Appraised

While  the  approach  proposed  here  may  not  be  the  best 
that  can  be  devised,  it  does  appear  to  offer  substantial 
advantages  to  the  research  user  over  any  of  the  alternatives 
he  currently  has  at  his  disposal.  Any  moderately  good 
appraisal  technique  which  takes  account  of  all  major  sources 
of  error  and  which  gets  used  is  an  improvement  by  an  order 
of magnitude over current practice.

Note (on the arithmetical procedure in Section 4.1): See Section 3.1 above.


As  George  Kennan  has  said  in  a slightly  different 
context,  "Tentative  solutions  to  major  problems  are  worth 
more  than  definitive  solutions  to  trivial  problems."  A 
major  object  of  this  paper  will  have  been  achieved  if  research 
users  are  encouraged  to  press  for  at  least  tentative  solutions 
to  the  major  appraisal  problems  they  face  and  to  be  a little 
more  suspicious  of  definitive  but  trivial  appraisals  which 
they  all  too  commonly  receive. 

While decision makers concerned with clarifying their
own uncertainties will surely support any move in the
direction of realistic target assessments (especially if
quick and cheap), resistance to progress can be expected
from two quarters: researchers whose work will come under
more stringent scrutiny and research commissioners who have
an interest in "proving" something to third parties (for
example, that their magazine penetrates markets attractive
to advertisers).

It is up to the ultimate research user--the executive--
to  make  sure  that  realistic  target  assessments  are  made, 
whether  this  approach  or  some  other  is  used.  Even  if  the 
user  does  not  make  the  assessment  himself,  he  can  at  least 
bring  pressure  to  bear  on  the  researcher  to  produce  his  own 
assessment  in  a form  which  makes  the  underlying  component 
assessments  explicit  and,  therefore,  subject  to  review 
(though  it  is  clearly  more  satisfactory  to  use  an  appraiser 
who is not beholden to the research practitioner). One
objective  of  this  study  has  been  to  dispose  of  the  claim, 
previously  tenable,  that  no  operational  and  logically  sound 
way  of  appraising  total  error  exists  for  target  assessment 
and  that,  therefore,  no  attempt  needs  to  be  made. 

Though it is perhaps too much to hope that the research
practitioner will carry out realistic target assessment
himself (at least for public consumption), he will surely be
motivated to try realistic research design appraisal,
particularly  if  he  ultimately  expects  his  research  estimates 
to  be  subjected  to  appraisal  (by  whatever  means)  of  their 
credibility.  In  this  way,  the  existence  of  at  least  one 
systematic  scheme  for  appraising  the  credibility  of  estimates 
could  conceivably  lead  to  dramatic  improvements  in  the 
practice  of  quantitative  research. 

Although  the  general  decomposed  variable  analysis  tech- 
nique and  the  decomposed  error  variant  of  it  show  good 
promise  in  marketing  and  survey  research  where  they  have 
most  frequently  been  applied,  even  there  they  are  at  a 
primitive  state  of  operational  development.  Other  researchers, 
notably  Professor  Charles  Mayer  of  York  University,  have 
been  working  on  the  critical  problem  of  how  to  make  reasonable, 
empirically  based  component  assessments  which  are  required 
by  these  techniques  or  others  with  the  same  objectives. 

Needless  to  say,  a great  deal  of  additional  work  needs 
to  be  done  generally  in  the  area  of  the  credibility  of 
estimates,  for  example,  in  ironing  out  operational  bugs  in 
specific  techniques  and  in  building  a solid  empirical  base 
upon which to make required assessments.